Named-Entity Recognition in Novel Domains with External Lexical Knowledge
Authors
Abstract
We investigate the adaptation of structured classifiers to new domains. In particular, we address the problem of applying a supervised Named-Entity Recognition (NER) system to data from a different source than the training data. We present a Semi-Markov Model, trained with the perceptron algorithm and coupled with an external dictionary, with the goal of improving generalization to the novel domain. Preliminary experiments show promising results, obtained with very simple additional features.
Similar resources
A Novel Approach to Conditional Random Field-based Named Entity Recognition using Persian Specific Features
Named Entity Recognition is an information extraction technique that identifies named entities in a text. Three approaches have conventionally been used to extract named entities from text: rule-based, machine-learning-based, and hybrids of the two. Machine-learning-based methods perform well on the Persian language if they are trained with good features. To get good performanc...
Robust Named Entity Recognition in Idiosyncratic Domains
Named entity recognition often fails in idiosyncratic domains. This causes problems for dependent tasks, such as entity linking and relation extraction. We propose a generic and robust approach for high-recall named entity recognition. Our approach is easy to train and offers strong generalization over diverse domain-specific language, such as news documents (e.g. Reuters) or biomedical text (e...
Improving Persian Named Entity Recognition Using the Ezafe (Kasre-ye Ezafe)
Named entity recognition is a process in which the names of people, places (cities, countries, seas, etc.), and organizations (public and private companies, international institutions, etc.), as well as dates, currencies, and percentages, are identified in a text. Named entity recognition plays an important role in many NLP tasks such as semantic role labeling, question answering, summarization, machine ...
Combining n-gram based statistics with traditional methods for named entity recognition
In this paper, we show three main results. First, we show that an n-gram dataset built from a large web crawl, as opposed to data from the specific target domain, can be used to perform the task of named entity recognition with reasonable accuracy. Second, we show that for complex domains, such as the MUC-7 NER task, the Lex method may not perform as well as other methods, due largely in part t...
Biological Named Entity Recognition Using n-grams and Classification Methods
We propose a biological named entity recognition system which uses classification methods and an n-gram model to annotate terms in text. A novel method is presented to express lexical features in a pattern notation. Prefix and suffix characters are used instead of lists of potential terms or other external resources. Classification exemplars are created from text by using a word n-gram...
Publication date: 2005